Diabetes Classification Using Machine Learning

Project information

Category: Exploratory Data Analysis
Industry: Healthcare
Data/Study based on: National Institute of Diabetes and Digestive and Kidney Diseases
Project date: 10 June 2025
Project URL: GitHub Link
Skills: Python | Machine Learning | EDA | scikit-learn | Domain Research | Data Cleaning & Processing | Feature Selection

Delivered a robust classification model to accurately predict diabetes types, supporting data-driven healthcare decisions.

Developed and evaluated several machine learning models (XGBoost, Random Forest, Logistic Regression, and SVM), achieving 91.99% validation accuracy and an F1-score of 0.9199 on a multi-class diabetes classification task over a dataset of 253,680 instances.
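The page does not say which F1 averaging was used. One useful fact: in single-label multi-class problems, micro-averaged F1 is always equal to accuracy, so identical figures (91.99% and 0.9199) are consistent with micro averaging, while macro and weighted F1 generally differ. The minimal stdlib sketch below computes all three alongside accuracy. The function names and the toy labels are illustrative assumptions, not the project's code; scikit-learn's `f1_score(..., average=...)` computes the same quantities.

```python
from collections import Counter


def _f1(tp, fp, fn):
    """F1 from counts: 2TP / (2TP + FP + FN), defined as 0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def classification_scores(y_true, y_pred):
    """Accuracy plus micro, macro and weighted F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    support = Counter(y_true)
    n = len(y_true)
    per_class = {c: _f1(tp[c], fp[c], fn[c]) for c in labels}
    return {
        "accuracy": sum(tp.values()) / n,
        # Pooling counts across classes: every error is one FP and one FN,
        # so micro F1 reduces to correct / total, i.e. accuracy.
        "micro_f1": _f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        "macro_f1": sum(per_class.values()) / len(labels),
        "weighted_f1": sum(per_class[c] * support[c] for c in labels) / n,
    }


if __name__ == "__main__":
    # Hypothetical labels: 0 = no diabetes, 1 = prediabetes, 2 = diabetes.
    y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]
    print(classification_scores(y_true, y_pred))
```

On imbalanced health data, macro F1 is often the more informative number to report next to accuracy, because it weights minority classes equally.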

Engineered composite features such as Cardiovascular Risk and Health Severity, improving model interpretability and boosting classification accuracy by 8%. Used RandomizedSearchCV for hyperparameter tuning, improving precision and recall by 15%.
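The page names the composite features but does not define them. The sketch below shows one plausible construction, assuming BRFSS-style indicator columns (`HighBP`, `HighChol`, `HeartDiseaseorAttack`, `Stroke`, `GenHlth`, `PhysHlth`, `MentHlth`, `DiffWalk`). These column names and both formulas are hypothetical, chosen only to illustrate how several raw indicators can be collapsed into one interpretable score.

```python
from typing import Dict

Record = Dict[str, float]

# Hypothetical binary (0/1) cardiovascular indicators.
CARDIO_FLAGS = ("HighBP", "HighChol", "HeartDiseaseorAttack", "Stroke")


def add_composite_features(row: Record) -> Record:
    """Return a copy of `row` with two hypothetical composite features added."""
    out = dict(row)
    # Cardiovascular risk: count of positive flags, ranging from 0 to 4.
    out["CardioVascularRisk"] = float(sum(row[f] for f in CARDIO_FLAGS))
    # Health severity: mean of components each rescaled to [0, 1].
    components = [
        (row["GenHlth"] - 1) / 4,  # assumed self-rated health scale 1 (best) .. 5 (worst)
        row["PhysHlth"] / 30,      # assumed days of poor physical health, 0..30
        row["MentHlth"] / 30,      # assumed days of poor mental health, 0..30
        row["DiffWalk"],           # assumed 0/1 difficulty-walking flag
    ]
    out["HealthSeverity"] = sum(components) / len(components)
    return out


if __name__ == "__main__":
    patient = {"HighBP": 1, "HighChol": 1, "HeartDiseaseorAttack": 0, "Stroke": 0,
               "GenHlth": 3, "PhysHlth": 15, "MentHlth": 0, "DiffWalk": 1}
    enriched = add_composite_features(patient)
    print(enriched["CardioVascularRisk"], enriched["HealthSeverity"])
```

Keeping each composite on a fixed, documented scale is what makes it interpretable: a clinician can read a Cardiovascular Risk of 2 as "two of four risk factors present". The enriched columns would then be fed to the models before hyperparameter tuning.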

Impact: Delivered a scalable and interpretable solution for diabetes classification, enabling healthcare professionals to better predict and manage patient outcomes. Insights from feature engineering provided actionable recommendations for early detection and personalized treatment strategies.

Future Enhancements: Proposed incorporating deep learning on time-series data to predict disease progression and severity over time, and dynamic meta-feature selection to improve efficiency and model adaptability for broader healthcare applications.